# Multimodal LLM

Slowfast Video Mllm Qwen2 7b Convnext 576 Frame96 S1t6

A video multimodal large language model that uses a slow-fast architecture to balance temporal resolution against spatial detail in video understanding, working around the sequence-length limits of conventional large language models.

Slowfast Video Mllm Qwen2 7b Convnext 576 Frame64 S1t4

A video multimodal large language model that uses a slow-fast architecture to balance temporal resolution against spatial detail. Supports 64-frame video understanding.

Mini Ichigo Llama3.2 3B S Instruct

A multimodal language model based on the Llama-3 architecture that natively accepts both audio and text input, with a focus on improving a large language model's understanding of audio.

Text-to-Audio English

Videollm Online 8b V1plus

VideoLLM-online is a multimodal large language model based on Llama-3-8B-Instruct, focused on online video understanding and video-to-text generation.

Video-to-Text English

AIbase

© 2025 AIbase